ENHANCING PERCEPTUAL PERFORMANCE OF HIGH FREQUENCY 
RECONSTRUCTION CODING METHODS BY ADAPTIVE FILTERING 



TECHNICAL FIELD 

The present invention relates to source coding systems utilising high frequency reconstruction (HFR) such as 
Spectral Band Replication, SBR [WO 98/57436] or related methods. It improves performance of both high quality 
methods (SBR), as well as low quality copy-up methods [U.S. Pat. 5,127,054]. It is applicable to both speech coding 
and natural audio coding systems. 



BACKGROUND OF THE INVENTION 

In high frequency reconstruction, where a low band is used to extrapolate a high band, it is important to have means 
to control the harmonics of the replicated high band. This is necessary since the harmonics usually are stronger in the 
low band compared to the high band. An extreme example is a very pronounced harmonic series in the lowband and 
more or less pure noise in the high band. One way to approach this is by adding noise adaptively to the highband 
(Adaptive Noise Addition). However, this is sometimes not enough to suppress the tonal character of the lowband, 
giving the highband a repetitive "buzzy" sound character. Another problem occurs when two harmonic series are 
mixed, one with high harmonic density (low pitch) and the other with lower harmonic density (high pitch). If the 
high-pitched harmonic series dominates over the other in the lowband but not in the highband, the HFR causes the 
harmonics of the high-pitched signal to dominate the highband, causing the high band to sound "metallic" compared 
to the original. In high frequency reconstruction methods such as SBR, it is possible to individually control the level 
of the generated harmonic series, i.e. second order harmonics or third order harmonics. This is however not enough 
to correct the problems described above. In other implementations a constant high order of spectral whitening is 
introduced during the spectral envelope adjustment of the HFR signal. This solves the above-described problems, but 
introduces new artifacts on other signal excerpts that do not benefit from the high order spectral whitening. 



SUMMARY OF THE INVENTION 

The present invention relates to the problem of "buzziness" and "metallic" sound that is sometimes introduced in 
HFR-methods. It uses a detection algorithm on the encoder side in order to estimate the preferable amount of 
spectral whitening to be applied in the decoder. The spectral whitening varies over time as well as over frequency, 
ensuring the best possibilities to control the harmonic contents of the replicated high band. The present invention 
can be carried out in a time-domain implementation as well as in a subband filterbank implementation. It uses linear 
prediction with variable predictor order as well as variable bandwidth expansion factor. 

The present invention comprises the following features: 

- In the encoder, estimation of the preferred degree of spectral whitening to use in the decoder, at a given time 
  and in a given frequency band. 
- In the decoder, spectral whitening in either the time domain or in a subband filterbank, in accordance 
  with the whitening information transmitted from the encoder. 
- The whitening in the decoder is obtained using linear prediction. 
- The degree of whitening is controlled by varying the predictor order, or by varying the bandwidth expansion 
  factor. 
- In a subband filterbank: low-order predictors, effective implementation, especially in a system where a 
  filterbank already is used for envelope adjustment. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described by way of illustrative examples, not limiting the scope or spirit of the 
invention, with reference to the accompanying drawings, in which: 

Fig. 1 illustrates bandwidth expansion of an LPC spectrum. 

Fig. 2a illustrates a worst case signal according to the present invention. 

Fig. 2b illustrates the autocorrelation for the highband and lowband of the worst case signal. 

Fig. 3 illustrates a time domain implementation of the adaptive whitening in the decoder. 

Fig. 4 illustrates a subband filterbank implementation of the adaptive whitening in the decoder. 

Fig. 5 illustrates an encoder implementation of the present invention. 

Fig. 6 illustrates a decoder implementation of the present invention. 


DESCRIPTION OF PREFERRED EMBODIMENTS 

The below-described embodiments are merely illustrative for the principles of the present invention for improvement 
of high frequency reconstruction systems. It is understood that modifications and variations of the arrangements and 
the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by 
the scope of the impending patent claims and not by the specific details presented by way of description and 
explanation of the embodiments herein. 

Linear prediction based spectral whitening 

When adjusting the spectral envelope of a signal to a given spectral envelope, a certain amount of whitening is always 
applied. This is because, if the desired spectral envelope is described by $H_{envRef}(f)$ and the spectral envelope of the 
current signal segment by $H_{envCur}(f)$, the filter function applied is 

$$H(f) = \frac{H_{envRef}(f)}{H_{envCur}(f)} \qquad \text{Eq. 1}$$

It is however not necessary to have the same frequency resolution for $H_{envRef}(f)$ as for $H_{envCur}(f)$. The current 
invention uses adaptive frequency resolution of $H_{envCur}(f)$ for envelope adjustment of HFR signals. The variable 
frequency resolution is obtained by using LPC of different order and with different bandwidth expansion factors for 
different frequency bands, in the time domain or in the filterbank domain, as will be explained and outlined below. 

Assume the input signal segment can be represented by a time varying digital filter whose steady-state system 
function is of the form 

$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} \alpha_k z^{-k}} \qquad \text{Eq. 2}$$

where $U(z)$ is the excitation signal (assumed to be a pulse train or noise), $S(z)$ is the current input signal 
segment, $G$ is a gain factor and $p$ is the predictor order. The input signal $s(n)$ to the HFR generator in the 
decoder can be predicted according to 

$$\hat{s}(n) = \sum_{k=1}^{p} \alpha_k s(n-k) \qquad \text{Eq. 3}$$

The prediction error can be defined as 

$$e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k s(n-k) \qquad \text{Eq. 4}$$

Hence the prediction error sequence is the output of a system whose transfer function is 

$$A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k} \qquad \text{Eq. 5}$$

This gives an estimate of the system transfer function 

$$\hat{H}(z) = \frac{G}{A(z)} \qquad \text{Eq. 6}$$

In order to spectrally whiten the signal segment it can be filtered with the inverse of $\hat{H}(z)$. The degree of 
whitening can be controlled by varying the predictor order, i.e. limiting the order of the polynomial $A(z)$ and thus 
limiting the amount of fine structure that can be described by $\hat{H}(z)$, or by applying a bandwidth expansion 
factor to the polynomial $A(z)$. If the bandwidth expansion factor is $\rho$, the polynomial $A(z)$ evaluates to 

$$A(\rho z) = a_0 (\rho z)^{0} + a_1 (\rho z)^{-1} + a_2 (\rho z)^{-2} + \ldots + a_p (\rho z)^{-p} \qquad \text{Eq. 7}$$

where $a_0 = 1$ and $a_k = -\alpha_k$ for $k \geq 1$, in agreement with Eq. 5. This expands the bandwidth of the 
formants estimated by $\hat{H}(z)$ according to Fig. 1. Thus the inverse filter at a given time can be described as 

$$H_{inv}(z, p, \rho) = \frac{A(\rho z)}{G} \qquad \text{Eq. 8}$$

The coefficients $\alpha_k$ can be obtained in different manners, e.g. the autocorrelation method or the covariance method 
[Digital Signal Processing, principles, algorithm and applications, Proakis & Manolakis, Prentice Hall]. The gain 
factor $G$ can be set to one if $H_{inv}$ is used as a pre-whitening filter followed by a regular envelope adjustment. It 
is common practice to add some sort of relaxation to the estimate in order to ensure stability of the system. When 
using the autocorrelation method this is easily accomplished by offsetting the zero-lag value of the correlation 
vector. This is equivalent to addition of white noise to the signal used to estimate $A(z)$. The parameters $p$ and $\rho$ 
are calculated based on information transmitted from the encoder. 
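
The steps described so far (coefficient estimation with the autocorrelation method, relaxation by offsetting the zero-lag value, and an inverse filter with a bandwidth expansion factor) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the implementation of the invention: the function names, the Hann analysis window and the default relaxation value are hypothetical, and the bandwidth expansion is read as scaling coefficient k by ρ^(-k), so that ρ > 1 moves the zeros of the inverse filter towards the origin.

```python
import numpy as np

def lpc_autocorr(s, p, relax=1e-3):
    """Predictor coefficients alpha_1..alpha_p via the autocorrelation method."""
    x = s * np.hanning(len(s))                 # this method requires windowing
    r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(p + 1)])
    r[0] *= 1.0 + relax                        # relaxation: offset the zero-lag value
    # Toeplitz normal equations R * alpha = r[1:]
    R = np.array([[r[abs(i - k)] for k in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

def inverse_filter_taps(alpha, rho=1.0, G=1.0):
    """FIR taps of A(rho*z)/G: tap k is -alpha_k * rho**(-k) (assumed convention)."""
    k = np.arange(1, len(alpha) + 1)
    return np.concatenate(([1.0], -np.asarray(alpha) * float(rho) ** (-k))) / G

def whiten(s, taps):
    """Apply the FIR inverse filter to the segment s."""
    return np.convolve(s, taps)[:len(s)]
```

With `rho=1.0` the filter is the plain prediction error filter; under the assumed convention, larger values of `rho` give a smoother inverse filter and therefore less whitening.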

An alternative to bandwidth expansion is described by: 

$$A_b(z) = 1 - b + b\,A(z) \qquad \text{Eq. 9}$$

where $b$ is the whitening factor. This yields the whitening filter according to: 

$$H_{inv}(z, p, b) = \frac{1 - b + b\,A(z)}{G} = \frac{1 - b \sum_{k=1}^{p} \alpha_k z^{-k}}{G} \qquad \text{Eq. 10}$$

Here it is evident that for $b = 1$ Eq. 10 evaluates to Eq. 8 with $\rho = 1$, and for $b = 0$ Eq. 10 evaluates to a constant 
non frequency selective gain factor. 
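
The whitening factor amounts to scaling every predictor tap by b, which the following Python sketch makes concrete. The function name is hypothetical, and the predictor coefficients are assumed to be given as an array `alpha` holding the coefficients of the prediction error filter for k = 1..p.

```python
import numpy as np

def whitening_factor_taps(alpha, b, G=1.0):
    """FIR taps of (1 - b + b*A(z)) / G, with A(z) = 1 - sum_k alpha_k z^-k.

    Expanding the expression gives 1 - b * sum_k alpha_k z^-k, so each
    predictor tap is simply scaled by the whitening factor b.
    """
    alpha = np.asarray(alpha, dtype=float)
    return np.concatenate(([1.0], -b * alpha)) / G
```

For `b = 0` the taps reduce to `[1, 0, ..., 0] / G`, a pure gain, and for `b = 1` they equal the taps of the full prediction error filter.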

The detector on the encoder side 

In the encoder a detector is used to assess the best degree of whitening (LPC order, bandwidth expansion factor and 
whitening factor) to be used in the decoder in order to obtain a highband as similar as possible to the original, given 
the currently used HFR method. Several approaches can be used in order to obtain a proper estimate of the degree of 
whitening to be used in the decoder. These include e.g. analysis by synthesis. The detector can be made to estimate 
the most suitable order of the LPC used in the decoder and different bandwidth expansion factors for different 
frequency bands. 

One approach uses autocorrelation to estimate the appropriate amount of whitening. The detector estimates the 
autocorrelation functions for the source range and the target range. Fig. 2a shows a worst case signal: a 
harmonic series in the lowband and white noise in the highband. The corresponding autocorrelation functions are 
displayed in Fig. 2b. Here it is evident that the lowband is highly correlated whilst the highband is not. The 
maximum correlation over all lags larger than a minimum lag is obtained for both the highband and the lowband. 
The quotient of the two is used to calculate the optimal bandwidth expansion factor. In the current implementation 
of the above outlined system, it is preferable to use FFTs for the computation of the correlation. The autocorrelation 
of a sequence $x(n)$ is defined by: 

$$r_{xx}(m) = \mathrm{FFT}^{-1}\left(|X(k)|^2\right) \qquad \text{Eq. 11}$$

where 

$$X(k) = \mathrm{FFT}(x(n)) \qquad \text{Eq. 12}$$

Since the objective is to compare the autocorrelation in the highband and the lowband, the 
filtering can be done in the frequency domain. This yields: 

$$X_{Lp}(k) = X(k)\,H_{Lp}(k), \qquad X_{Hp}(k) = X(k)\,H_{Hp}(k) \qquad \text{Eq. 13}$$

where $H_{Lp}(k)$ and $H_{Hp}(k)$ are the Fourier transforms of the impulse responses of the LP and HP filters. From the 
above, the autocorrelation functions for the lowband and highband can be calculated according to: 

$$r_{xxLp}(m) = \mathrm{FFT}^{-1}\left(|X_{Lp}(k)|^2\right), \qquad r_{xxHp}(m) = \mathrm{FFT}^{-1}\left(|X_{Hp}(k)|^2\right) \qquad \text{Eq. 14}$$

The maximum value, for a lag larger than a minimum lag, for each autocorrelation vector is calculated: 

$$r_{MaxLp} = \max\left(r_{xxLp}(m)\right) \;\; \forall\, m > minLag, \qquad r_{MaxHp} = \max\left(r_{xxHp}(m)\right) \;\; \forall\, m > minLag \qquad \text{Eq. 15}$$

The quotient can be used, for instance, to map linearly to a suitable bandwidth expansion factor. 
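
The autocorrelation-based detector can be sketched in Python as follows. This is only an illustration under stated assumptions: the LP and HP filters are replaced by ideal brick-wall masks in the frequency domain, each maximum is normalised by its zero-lag value so that band energy does not dominate the quotient, and the function names and the `crossover_bin` and `min_lag` parameters are hypothetical. The final mapping of the quotient to a bandwidth expansion factor is left out because the text does not specify it.

```python
import numpy as np

def band_autocorr(x, mask):
    """Circular autocorrelation of the band selected by `mask`, computed via FFT."""
    X = np.fft.fft(x)
    return np.real(np.fft.ifft(np.abs(X * mask) ** 2))

def correlation_quotient(x, crossover_bin, min_lag):
    """Highband-to-lowband quotient of the maximum normalised autocorrelation."""
    n = len(x)
    idx = np.arange(n)
    k = np.minimum(idx, n - idx)                 # bin index folded to |frequency|
    lp = (k < crossover_bin).astype(float)       # brick-wall lowpass (assumption)
    hp = 1.0 - lp                                # complementary highpass
    r_lp = band_autocorr(x, lp)
    r_hp = band_autocorr(x, hp)
    lags = slice(min_lag + 1, n // 2)            # only lags larger than min_lag
    max_lp = r_lp[lags].max() / r_lp[0]          # zero-lag normalisation (assumption)
    max_hp = r_hp[lags].max() / r_hp[0]
    return max_hp / max_lp
```

For a signal like the worst case described above, a harmonic lowband and a noise-like highband, the quotient is well below one, which indicates that strong whitening of the replicated highband is appropriate.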



A different detector approach is to obtain an estimate of the harmonic to non-harmonic signal ratio H in each 
subband of a complex filter bank by using very low order linear prediction for each block of subband samples. The 
energy of the predicted block of subband samples divided by the total energy of the block gives an estimate of H as 
a function of both time and frequency. The difference between highband and lowband values of H is then used to 
adjust the degree of whitening such that the harmonic to signal ratio of the synthetically generated highband 
approaches that of the original highband. Here it is advantageous to control the degree of whitening utilising the 
whitening factor b (Eq. 9). 
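
A minimal Python sketch of this estimate for one block of complex subband samples is given below. It assumes a least-squares fit of the predictor over the block and a default order of 2; both choices, and the function name, are illustrative assumptions rather than details taken from the text.

```python
import numpy as np

def harmonic_ratio(x, p=2):
    """Energy of the linearly predicted block divided by the total block energy."""
    x = np.asarray(x, dtype=complex)
    # Column k-1 holds the delayed samples x(n-k) for n = p..N-1
    past = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    target = x[p:]
    # Least-squares predictor coefficients for this block
    alpha, *_ = np.linalg.lstsq(past, target, rcond=None)
    predicted = past @ alpha
    return np.sum(np.abs(predicted) ** 2) / np.sum(np.abs(target) ** 2)
```

Because the prediction is a least-squares projection, the ratio lies between 0 and 1: close to 1 for a stationary complex sinusoid and close to 0 for white noise.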

Adaptive LPC-based whitening in the time domain 

When performing whitening in the time domain the autocorrelation method is preferred. The autocorrelation 
method requires windowing of the input segment used to estimate the coefficients $\alpha_k$; this is not the case for the 
covariance method. The filter used for the spectral whitening is 

$$H_{inv}(z, p, \rho) = 1 - \sum_{k=1}^{p} \alpha_k (z\rho)^{-k} \qquad \text{Eq. 16}$$

where the gain factor $G$ is set to one. It is beneficial to whiten the signal in the low band prior to the HFR generator, 
since the whitening thus can operate on a lower sampling rate. The lowband signal is windowed and whitened on a 
suitable time base with the predictor order and bandwidth expansion factors given by the encoder, according to Fig. 
3. In the current implementation of the present invention the signal is low pass filtered 301 and decimated 302. Block 303 
illustrates the spectral whitening. A window 306 is used to select the proper time segment for estimation of the $A(z)$ 
polynomial 307; 50% overlap is used. The LPC-routine extracts $A(z)$ given the currently preferred LPC-order and 
bandwidth expansion factor, with a suitable relaxation. A FIR filter 308 is used to inverse filter the signal segment. 
The whitened signal segments are upsampled 304, 305 and windowed together, forming the input signal to the HFR 
unit. 
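
The block-wise whitening with 50% overlap can be sketched in Python as below. The sketch covers only the whitening stage: low pass filtering, decimation and upsampling are omitted, the predictor order and bandwidth expansion factor are held constant instead of being read from the encoder data, and the window shapes, the relaxation value, the ρ^(-k) scaling convention and the function names are assumptions.

```python
import numpy as np

def lpc_taps(seg, p, rho, relax=1e-3):
    """Prediction error filter taps for one segment (autocorrelation method)."""
    x = seg * np.hanning(len(seg))                     # analysis window
    r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(p + 1)])
    r[0] = r[0] * (1.0 + relax) + 1e-12                # relaxation, guards silence
    R = np.array([[r[abs(i - k)] for k in range(p)] for i in range(p)])
    alpha = np.linalg.solve(R, r[1:])
    k = np.arange(1, p + 1)
    return np.concatenate(([1.0], -alpha * float(rho) ** (-k)))

def whiten_overlap_add(s, block, p, rho):
    """Whiten s block by block with 50% overlap and overlap-add the results."""
    hop = block // 2
    # Synthesis window whose shifted copies sum to one at 50% overlap
    win = np.sin(np.pi * (np.arange(block) + 0.5) / block) ** 2
    out = np.zeros(len(s))
    for start in range(0, len(s) - block + 1, hop):
        taps = lpc_taps(s[start:start + block], p, rho)
        lo = max(0, start - p)                         # filter history before the block
        y = np.convolve(s[lo:start + block], taps)
        y = y[start - lo:start - lo + block]           # samples belonging to this block
        out[start:start + block] += win * y
    return out
```

Using the preceding samples as filter history avoids a start-up transient at every block boundary; only the very first block lacks history.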

Adaptive LPC-based whitening in a subband filter bank 

The whitening can be performed effectively and robustly by using a complex filter bank. The linear prediction and 
the inverse filtering are then done independently for each of the subband signals produced by the filter bank. The 
output from a complex filter bank used for the analysis of a real valued signal can be interpreted as a set of subband 
signals that are oversampled by a factor of two in the frequency domain. This feature is exploited to heavily reduce 
artifacts due to aliasing emerging from independent modifications of the subband signals, which for example inverse 
filtering results in. 

The whitening of the subband signals is obtained through linear prediction analogous to the time domain method 
described above. The subband signals are however complex valued, so complex filter coefficients are used for the 
linear prediction as well as for the inverse filtering. The order of the linear prediction can be kept very low since the 
expected number of harmonic components in each frequency band is very small. In order to correspond to the same 
time base as the time domain LPC, the number of subband samples in each block is smaller by a factor equal to the 
downsampling of the filter bank. Given the low filter order and small block sizes the prediction filter coefficients are 
obtained from the covariance method. Although no windowing of subband signal blocks is performed prior to the 
computation of covariance estimates, the inner products are weighted by a positive valued window $w(n)$. The matrix 
elements involved for a block size of $N$ samples for the subband signal corresponding to channel $l$ are thus: 

$$\phi_l(i,k) = \sum_{n=0}^{N-1} x_l(n-i)\, x_l^*(n-k)\, w(n), \qquad 0 \leq i \leq p, \quad 1 \leq k \leq p \qquad \text{Eq. 17}$$

Here, $x_l(n)$ is the subband signal, $x_l^*(n)$ is its complex conjugate, and $p$ is the order of linear prediction. The 
purpose of this internal windowing is mainly to enhance the whitening of tonal signals with time varying frequency 
content. The coefficients obtained from these covariances are used to filter the subband signal block 
$[x_l(0), x_l(1), \ldots, x_l(N-1)]$. Filter coefficient calculation and whitening can be performed on a block by block basis 
using subband sample time step $L$, which is smaller than the block length $N$. The whitened blocks should be added 
together using appropriate synthesis windowing. 
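
The weighted covariance method for one complex subband block can be sketched in Python as follows. The sketch assumes that the block starts at index `n0 >= p` of a longer array, so that the required past samples exist, and it adds a small diagonal loading term for numerical robustness; the loading, the function names and the parameter names are assumptions, not details taken from the text.

```python
import numpy as np

def phi(x, n0, N, i, k, w):
    """Weighted covariance sum_n x(n-i) * conj(x(n-k)) * w(n) over one block."""
    n = n0 + np.arange(N)
    return np.sum(x[n - i] * np.conj(x[n - k]) * w)

def covariance_lpc(x, n0, N, p, w, relax=1e-6):
    """Complex predictor coefficients minimising the w-weighted prediction error."""
    # Normal equations: sum_k alpha_k * phi(k, j) = phi(0, j) for j = 1..p
    Phi = np.array([[phi(x, n0, N, k, j, w) for k in range(1, p + 1)]
                    for j in range(1, p + 1)])
    rhs = np.array([phi(x, n0, N, 0, j, w) for j in range(1, p + 1)])
    Phi = Phi + relax * np.trace(Phi).real / p * np.eye(p)   # loading (assumption)
    return np.linalg.solve(Phi, rhs)

def whiten_block(x, n0, N, alpha):
    """Prediction error e(n) = x(n) - sum_k alpha_k x(n-k) over the block."""
    n = n0 + np.arange(N)
    pred = sum(a * x[n - k] for k, a in enumerate(alpha, start=1))
    return x[n] - pred
```

In a complete system this would be repeated every L subband samples, with the whitened blocks combined through a synthesis window as described above.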

Feeding a maximally decimated filterbank with an input signal consisting of white gaussian noise will produce 
subband signals with white spectral density. In a complex filter bank, the subband signals are oversampled by a 
factor of two as mentioned above. Feeding the oversampled filterbank with white noise gives subband signals with 
coloured spectral density. This is due to the effects of the frequency responses of the analysis filters. The LPC 
predictors in the filterbank channels will track the filter characteristics in the case of noise-like input signals. This is 
an unwanted feature, and benefits from compensation. A possible solution is pre-filtering of the input signals to the 
linear predictors. The pre-filtering should be an inverse, or an approximation of the inverse, of the analysis filters, in 
order to compensate for the frequency responses of the analysis filters. The whitening filters are fed with the original 
subband signals, as described above. Fig. 4 illustrates the whitening process of a subband signal. The complex-valued 
signal corresponding to channel $l$ is fed to the pre-filtering block 401, and subsequently to a delay chain 
whose depth depends on the filter order 402. The delayed signals and their conjugates 403 are fed to the linear 
prediction block 404, where the complex-valued coefficients are calculated. The coefficients from every L:th 
calculation are kept by the decimator 405. The subband signals are finally filtered through the filter block 406, where 
the predicted coefficients are used and updated for every L:th sample. 
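
The colouring of oversampled subband signals can be illustrated with a small Python experiment. It models a single channel only, by complex demodulation, a windowed-sinc lowpass and decimation by M; this is a stand-in for one channel of a complex filter bank, not the filter bank of the invention, and all names and parameter values are assumptions.

```python
import numpy as np

def subband_of_white_noise(M=16, channel=3, n=1 << 15, seed=0):
    """One oversampled complex subband signal obtained from real white noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    t = np.arange(n)
    centre = np.pi * (channel + 0.5) / M                 # channel centre frequency
    taps = np.arange(-8 * M, 8 * M + 1)
    # Windowed-sinc lowpass with cutoff pi/(2M): channel bandwidth pi/M
    proto = np.sinc(taps / (2 * M)) * np.hanning(len(taps))
    base = np.convolve(x * np.exp(-1j * centre * t), proto, mode="same")
    return base[::M]                                     # decimate by M: 2x oversampled

def lag1_correlation(y):
    """Magnitude of the normalised autocorrelation at lag one."""
    return np.abs(np.vdot(y[:-1], y[1:])) / np.vdot(y, y).real
```

Although the input is white, `lag1_correlation(subband_of_white_noise())` is far from zero, so a low-order predictor fed with this signal would model the analysis filter response rather than the signal, which is the effect the pre-filtering is meant to compensate.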

Practical implementations 

The present invention can be implemented in both hardware chips and DSPs, for various kinds of systems, for 
storage or transmission of signals, analogue or digital, using arbitrary codecs. Fig. 5 and Fig. 6 show a possible 
implementation of the present invention. In Fig. 5 the encoder side is displayed. The analogue input signal is fed to 
the A/D converter 501, and to an arbitrary audio coder 502, as well as the inverse filtering level estimation unit 503 
and an envelope extraction unit 504. The coded information is multiplexed into a serial bitstream 505, and 
transmitted or stored. In Fig. 6 a typical decoder implementation is displayed. The serial bitstream is de-multiplexed 
601, and the envelope data, i.e. the spectral envelope of the high-band, is decoded 602. The de-multiplexed source 
coded signal is decoded using an arbitrary audio decoder 603. The decoded signal is fed to the spectral whitening 
unit 604, which performs the adaptive spectral whitening. The spectrally whitened signal is fed to an arbitrary HFR 
unit 605, and to the envelope adjuster 606. The output from the envelope adjuster is combined with the decoded 
signal fed through a delay 607. Finally, the digital output is converted back to an analogue waveform 608. 



CLAIMS 

1. A method for enhancement of source coding systems using high-frequency reconstruction, where said source 
coding system comprises an encoder representing all operations performed prior to storage or transmission, and a 
decoder representing all operations performed after storage or transmission, characterised by: 

at said encoder, estimating the required amount of spectral whitening at a given time and frequency, and 
transmitting information on said amount of spectral whitening from said encoder to said decoder; 

at said decoder, adaptively, spectrally whiten a signal prior to HFR or after HFR, according to the whitening 
information obtained from said encoder. 

2. A method according to claim 1, characterised in that said spectral whitening is performed in the time domain. 

3. A method according to claim 1, characterised in that said spectral whitening is performed in a subband filterbank. 

4. A method according to claim 1, characterised in that said estimation of required amount of spectral whitening is 
done using analysis by synthesis. 

5. A method according to claim 1, characterised in that said estimation of required amount of spectral whitening is 
done by comparative study of the autocorrelation in the lowband and the highband. 


6. A method according to claim 1, characterised in that the amount of spectral whitening is controlled by the LPC 
predictor order. 

7. A method according to claim 1, characterised in that the amount of spectral whitening is controlled by the 
bandwidth expansion factor of the LPC polynomial. 

8. A method according to claim 1, characterised in that the amount of spectral whitening is controlled by the 
whitening factor b. 

9. A method according to claim 3, characterised in that different bandwidth expansion factors are used for different 
filterbank bands. 

10. A method according to claim 3, characterised in that pre-filtering is included in the LPC estimation in order to 
compensate for the characteristic of the filterbank analysis filters. 

11. An apparatus for enhancement of source coding systems using high-frequency reconstruction, where said 
apparatus comprises a decoder, for decoding a coded signal encoded by an encoder, characterised by: 

means for estimating the optimal amount of spectral whitening at a given time and frequency, in said encoder; 

means for adaptively, spectrally whiten a signal prior to HFR or after HFR, in said decoder according to the 
whitening information obtained from said encoder. 

ABSTRACT 

The present invention proposes new methods and an apparatus for enhancement of source coding systems utilising 
high frequency reconstruction (HFR). It uses a detection algorithm on the encoder side in order to estimate the 
correct amount of spectral whitening to be applied in the decoder. The spectral whitening varies over time as well as 
over frequency, ensuring the best possibilities to control the harmonic contents of the replicated high band. The 
present invention is applicable to both speech coding and natural audio coding systems. 